A Proofs

Neural Information Processing Systems

First, we recall basic properties of convex conjugate functions that we rely on in our proofs. The latter condition holds, e.g., for strongly convex functions. We provide the hyperparameters of all experiments with Algorithm 1 in Table 3. We use the Adam optimizer with the default betas. For the Gaussian case, we use a single GTX 1080 Ti GPU.



Rectified Noise: A Generative Model Using Positive-incentive Noise

Gu, Zhenyu, Xu, Yanchen, Huang, Sida, Guo, Yubin, Zhang, Hongyuan

arXiv.org Artificial Intelligence

Rectified Flow (RF) has been widely used as an effective generative model. Although RF is primarily based on probability flow Ordinary Differential Equations (ODEs), recent studies have shown that injecting noise through reverse-time Stochastic Differential Equations (SDEs) for sampling can achieve superior generative performance. Inspired by Positive-incentive Noise (pi-noise), we propose an innovative generative algorithm to train pi-noise generators, namely Rectified Noise (RN), which improves generative performance by injecting pi-noise into the velocity field of pre-trained RF models. With the Rectified Noise pipeline, pre-trained RF models can be efficiently transformed into pi-noise generators. We validate Rectified Noise by conducting extensive experiments across various model architectures on different datasets. Notably, we find that: (1) RF models using Rectified Noise reduce FID from 10.16 to 9.05 on ImageNet-1k. (2) Pi-noise generator models achieve improved performance with only 0.39% additional training parameters.
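The core idea of perturbing a pre-trained velocity field during sampling can be sketched as follows. This is a minimal toy illustration, not the paper's method: the `velocity` and `noise_generator` functions are hypothetical stand-ins for a trained RF network and a learned pi-noise generator.

```python
import numpy as np

rng = np.random.default_rng(0)

def velocity(x, t):
    # Toy stand-in for a pre-trained rectified-flow velocity field;
    # a real model would be a neural network v_theta(x, t).
    return -x

def noise_generator(x, t):
    # Toy stand-in for a learned pi-noise generator; the paper trains
    # this module, here it is a small fixed Gaussian perturbation.
    return 0.05 * rng.standard_normal(x.shape)

def sample(x0, steps=100, inject_noise=True):
    # Euler integration of the flow from t=0 to t=1, optionally
    # adding pi-noise to the velocity field at each step.
    x, dt = x0.copy(), 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity(x, t)
        if inject_noise:
            v = v + noise_generator(x, t)
        x = x + v * dt
    return x

x0 = rng.standard_normal((4, 2))
print(sample(x0).shape)  # (4, 2)
```

The sampler's structure is unchanged from plain rectified flow; only the velocity is perturbed, which is why pre-trained models can be reused.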


Enhancing Fractional Gradient Descent with Learned Optimizers

Sobotka, Jan, Šimánek, Petr, Kordík, Pavel

arXiv.org Machine Learning

Fractional Gradient Descent (FGD) offers a novel and promising way to accelerate optimization by incorporating fractional calculus into machine learning. Although FGD has shown encouraging initial results across various optimization tasks, it faces significant challenges with convergence behavior and hyperparameter selection. Moreover, the impact of its hyperparameters is not fully understood, and scheduling them is particularly difficult in non-convex settings such as neural network training. To address these issues, we propose a novel approach called Learning to Optimize Caputo Fractional Gradient Descent (L2O-CFGD), which meta-learns how to dynamically tune the hyperparameters of Caputo FGD (CFGD). Our method's meta-learned schedule outperforms CFGD with static hyperparameters found through an extensive search and, in some tasks, achieves performance comparable to a fully black-box meta-learned optimizer. L2O-CFGD can thus serve as a powerful tool for researchers to identify high-performing hyperparameters and gain insights into how to leverage the history-dependence of the fractional differential in optimization.
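To make the CFGD update concrete, here is a sketch using one common first-order truncation of the Caputo fractional derivative; this is an illustrative assumption, not the paper's exact formulation, and the static alpha schedule below is precisely what L2O-CFGD would replace with a meta-learned one.

```python
import math
import numpy as np

def caputo_fgd_step(x, grad, c, alpha, lr):
    # One CFGD update using a common first-order truncation of the
    # Caputo fractional derivative with lower terminal c:
    #   D^alpha f(x) ~= f'(x) * |x - c|^(1 - alpha) / Gamma(2 - alpha)
    # Setting alpha = 1 recovers ordinary gradient descent.
    scale = np.abs(x - c) ** (1.0 - alpha) / math.gamma(2.0 - alpha)
    return x - lr * grad(x) * scale

# Minimize f(x) = x^2 with a fixed (alpha, lr) per step; in L2O-CFGD
# these hyperparameters would be produced by a meta-learned model.
grad = lambda x: 2.0 * x
x, c = np.array([2.0]), np.array([0.0])
for step in range(50):
    alpha = 0.9  # static here; dynamically scheduled in the paper
    x = caputo_fgd_step(x, grad, c, alpha, lr=0.1)
print(float(x[0]) < 0.1)  # True
```

The |x - c| factor is what makes the update history-dependent through the lower terminal c, the behavior the meta-learner exploits.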




Stay Focused: Problem Drift in Multi-Agent Debate

Becker, Jonas, Kaesberg, Lars Benedikt, Stephan, Andreas, Wahle, Jan Philip, Ruas, Terry, Gipp, Bela

arXiv.org Artificial Intelligence

Multi-agent debate - multiple instances of large language models discussing problems in turn-based interaction - has shown promise for solving knowledge and reasoning tasks. However, these methods show limitations, particularly when scaling them to longer reasoning chains. In this study, we unveil a new issue of multi-agent debate: discussions drift away from the initial problem over multiple turns. We define this phenomenon as problem drift and quantify its presence across ten tasks (i.e., three generative, three knowledge, three reasoning, and one instruction-following task). To identify the reasons for this issue, we perform a human study with eight experts on discussions suffering from problem drift, who find the most common issues are a lack of progress (35% of cases), low-quality feedback (26% of cases), and a lack of clarity (25% of cases). To systematically address the issue of problem drift, we propose DRIFTJudge, a method based on LLM-as-a-judge, to detect problem drift at test time. We further propose DRIFTPolicy, a method to mitigate 31% of problem drift cases. Our study can be seen as a first step toward understanding a key limitation of multi-agent debate, highlighting pathways for improving its effectiveness in the future.
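The test-time monitoring loop can be sketched as follows. The judge here is a deliberately simple word-overlap heuristic standing in for an LLM judge; DRIFTJudge itself prompts an LLM, and the function names and threshold below are illustrative assumptions.

```python
def toy_drift_judge(problem, turn, threshold=0.2):
    # Toy stand-in for an LLM-as-a-judge: flags a debate turn as
    # drifting when its word overlap with the original problem is low.
    p, t = set(problem.lower().split()), set(turn.lower().split())
    overlap = len(p & t) / max(len(p), 1)
    return overlap < threshold

def monitor_debate(problem, turns):
    # Returns indices of debate turns flagged as problem drift;
    # a mitigation policy could then intervene on those turns.
    return [i for i, turn in enumerate(turns) if toy_drift_judge(problem, turn)]

problem = "compute the shortest path between node a and node b"
turns = [
    "the shortest path between node a and node b uses dijkstra",
    "let us instead discuss general philosophy of graphs and life",
]
print(monitor_debate(problem, turns))  # [1]
```

The key design point is that detection runs per turn at test time, so an off-topic turn can be caught before the drift compounds over the remaining discussion.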


Feature-Specific Coefficients of Determination in Tree Ensembles

Jiang, Zhongli, Zhang, Dabao, Zhang, Min

arXiv.org Machine Learning

Tree ensemble methods provide promising predictions with models that are difficult to interpret. The recent introduction of Shapley values for individualized feature contributions, accompanied by several fast computing algorithms for predicted values, shows intriguing results. However, individualizing coefficients of determination, a.k.a. $R^2$, for each feature is challenged by the underlying quadratic losses, although these coefficients allow us to comparatively assess a single feature's contribution to tree ensembles. Here we propose an efficient algorithm, Q-SHAP, that reduces the computational complexity to polynomial time when calculating Shapley values related to quadratic losses. Our extensive simulation studies demonstrate that this approach not only enhances computational efficiency but also improves estimation accuracy of feature-specific coefficients of determination.
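The underlying decomposition can be illustrated with a brute-force Shapley computation over subset-restricted $R^2$ values. This exponential-time sketch on a linear model is only meant to show what "feature-specific $R^2$" means; Q-SHAP's contribution is computing the tree-ensemble analogue in polynomial time, and the helper names here are hypothetical.

```python
from itertools import combinations
import math
import numpy as np

def shapley_r2(d, value):
    # Brute-force Shapley decomposition of R^2 across d features,
    # where value(S) is the R^2 of a model restricted to subset S.
    # By the efficiency axiom, phi sums to value(all features).
    phi = np.zeros(d)
    for j in range(d):
        others = [f for f in range(d) if f != j]
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                w = math.factorial(k) * math.factorial(d - k - 1) / math.factorial(d)
                phi[j] += w * (value(S + (j,)) - value(S))
    return phi

def linear_r2(X, y):
    # Subset value function: R^2 of an ordinary least-squares fit
    # using only the features in S (toy stand-in for a tree ensemble).
    def value(S):
        if not S:
            return 0.0
        Xs = X[:, list(S)]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        return 1.0 - (y - Xs @ beta).var() / y.var()
    return value

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
y = 2 * X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.standard_normal(200)
phi = shapley_r2(3, linear_r2(X, y))
print(np.isclose(phi.sum(), linear_r2(X, y)((0, 1, 2))))  # True
```

Because the per-feature scores sum exactly to the full model's $R^2$, they support a fair comparative assessment of single-feature contributions, which is the quantity Q-SHAP makes tractable for tree ensembles.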


Neural Network Compression for Reinforcement Learning Tasks

Ivanov, Dmitry A., Larionov, Denis A., Maslennikov, Oleg V., Voevodin, Vladimir V.

arXiv.org Artificial Intelligence

In the last decade, neural networks (NNs) have driven significant progress across various fields, notably in deep reinforcement learning, highlighted by studies like [1, 2, 3]. This progress has the potential to transform many areas, such as embedded devices, IoT, and robotics. Although modern deep learning models have demonstrated impressive gains in accuracy, their large sizes limit their practical use in many real-world applications [4]. These applications may impose requirements on energy consumption, inference latency, inference throughput, memory footprint, real-time inference, and hardware costs. Numerous studies have attempted to make neural networks more efficient.